M. Adlan Ramly
UX/UI Designer, AR/VR Developer, HCI Researcher

AI-Vis: AI-Based Multidimensional Data Visualization on Augmented/Virtual Reality Platforms

2017
Researcher & UX Designer
Collaborator(s): Xianling Zhang, Adam Niederer, Yuya Ong, Andrew Cheng

SIGGRAPH 2017 Student Posters Semi-Finalist
uSens Developer Challenge Finalist

Abstract

We propose a real-time data visualization system that enables users to interact with an AI-based interface on augmented/virtual reality platforms. This method differs from traditional 2D displays by presenting data distributions in an immersive manner. Users can issue gesture and voice commands to retrieve analysis results and interact with each data point.

Introduction

Traditionally, researchers analyze data on personal computers. This setup has inherent limitations: flat 2D displays cannot meaningfully portray three-dimensional data and offer a distant, non-immersive perspective to the user. With the recent advances in VR/AR toolkits, we can improve the immersiveness of data presentation through head-mounted displays. Our approach consists of both backend and frontend work. On the backend, a cloud server computes the data and sends it to the client for visualization. On the frontend, the AI assistant is summoned by a gesture, after which the user can search topics via voice commands. Once a topic is requested, the corresponding data distribution appears on the head-mounted display, and the user can select each data point to inspect the corresponding documents.

Design Approach

When determining gestures, we picked ones that are generally intuitive and acceptable. Open palms are widely recognized as a friendly and welcoming gesture, so to call the AI assistant, the user holds a palm facing upward and the assistant pops up on the palm. The user can then speak to the AI, for example: "Find a paper related to mixed reality." The AI recognizes the command and shows a cloud of data points representing papers related to mixed reality. To select a data point, we chose the two-finger pinch gesture in order to stay consistent with Microsoft HoloLens' interaction design. Selecting a data point reveals the details of the corresponding paper. The interaction flow is sketched below.
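To make that flow concrete, here is a minimal, platform-agnostic sketch of the interaction as a state machine. The event names (PALM_UP, VOICE_QUERY, PINCH_SELECT, PALM_DOWN) are illustrative placeholders, not the actual uSens or HoloLens SDK callbacks used in the Unity frontend.

# Illustrative interaction state machine for the AI-Vis assistant.
# Event names are placeholders; in the real system they would be driven by
# the gesture and voice recognition callbacks on the headset.

from enum import Enum, auto

class State(Enum):
    IDLE = auto()        # no assistant visible
    ASSISTANT = auto()   # open-palm gesture summoned the assistant
    CLOUD = auto()       # voice query answered with a cloud of data points
    DETAIL = auto()      # a pinch selected one data point; show its document

def step(state, event):
    """Advance the interaction given the current state and an input event."""
    if state is State.IDLE and event == "PALM_UP":
        return State.ASSISTANT
    if state is State.ASSISTANT and event == "VOICE_QUERY":
        return State.CLOUD        # e.g. "Find a paper related to mixed reality"
    if state is State.CLOUD and event == "PINCH_SELECT":
        return State.DETAIL
    if event == "PALM_DOWN":
        return State.IDLE         # dismissing the assistant resets the flow
    return state                  # ignore events that do not apply

# Example: summon the assistant, search, then inspect one result.
s = State.IDLE
for e in ["PALM_UP", "VOICE_QUERY", "PINCH_SELECT"]:
    s = step(s, e)
    print(e, "->", s.name)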

Technical Approach

Pre-processing: Le and Mikolov's paragraph-vector representation of documents (Le & Mikolov, 2014) is used together with Tipping and Bishop's probabilistic principal component analysis, with whitening, to decompose the vectors into a three-dimensional space (Tipping & Bishop, 1999). The documents can then be searched via online (streaming) Latent Dirichlet Allocation, as described by Hoffman et al. (2010). The search produces a scalar similarity index that is represented in 3D space as proximity to the searching user. These coordinates, along with additional scalar quantities for each document, are acquired over a WebSocket and streamed to the HoloLens/Oculus in real time.
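The following is a minimal offline sketch of this pipeline, assuming gensim (version 4 or later) for the paragraph vectors and online LDA, and scikit-learn's PCA with whitening as a stand-in for the probabilistic PCA projection; the toy corpus and all hyperparameters are placeholders, not the values used in the deployed system.

# Sketch of the pre-processing pipeline (not the production code).
# Doc2Vec stands in for paragraph vectors (Le & Mikolov, 2014), PCA with
# whitening for the probabilistic PCA projection (Tipping & Bishop, 1999),
# and gensim's LdaModel implements online LDA (Hoffman et al., 2010).

from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.decomposition import PCA

# Placeholder corpus; in practice each entry is the full text of one paper.
docs = [
    "mixed reality interaction with hand gesture input".split(),
    "latent dirichlet allocation for topic modelling of papers".split(),
    "probabilistic principal component analysis with whitening".split(),
    "immersive visualization of document collections in virtual reality".split(),
    "paragraph vectors for distributed document representations".split(),
]

# 1. Paragraph vectors: one dense vector per document.
tagged = [TaggedDocument(words, [i]) for i, words in enumerate(docs)]
d2v = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40)
vectors = [d2v.dv[i] for i in range(len(docs))]

# 2. Project to 3D with whitening; these become the X, Y, Z positions
#    streamed to the headset.
coords = PCA(n_components=3, whiten=True).fit_transform(vectors)

# 3. Online LDA over the same corpus for keyword search / similarity scoring.
dictionary = Dictionary(docs)
bow = [dictionary.doc2bow(words) for words in docs]
lda = LdaModel(bow, id2word=dictionary, num_topics=2, passes=5)

for i, xyz in enumerate(coords):
    print(i, xyz, lda.get_document_topics(bow[i]))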

Run-time processing: The user puts on the headset and issues a command of the form "search" + keywords; the cloud server responds with a dictionary of relevant keywords. After retrieving the dictionary, the client visualizes the data points at their X, Y, Z coordinates. An individual data point does not carry much meaning on its own, but each cluster of data points represents a group of documents with similar keywords.
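A sketch of the server side of this search round trip is shown below, assuming Python's websockets package (version 10 or later) on the cloud server; the JSON field names (command, query, points, doc_id, x/y/z, similarity, title) and the search_corpus helper are illustrative assumptions, not the exact protocol of the deployed system.

# Illustrative WebSocket handler for the "search" round trip.

import asyncio
import json

import websockets

def search_corpus(query):
    # Placeholder lookup into the pre-processed LDA index; returns
    # (doc_id, (x, y, z), similarity, title) tuples.
    return [(0, (0.1, 0.4, -0.2), 0.93, "Example mixed reality paper")]

async def handle(ws):
    async for message in ws:
        request = json.loads(message)   # e.g. {"command": "search", "query": "mixed reality"}
        if request.get("command") != "search":
            continue
        results = search_corpus(request["query"])
        await ws.send(json.dumps({
            "query": request["query"],
            "points": [
                {"doc_id": d, "x": x, "y": y, "z": z, "similarity": s, "title": t}
                for d, (x, y, z), s, t in results
            ],
        }))

async def main():
    async with websockets.serve(handle, "0.0.0.0", 8765):
        await asyncio.Future()   # run forever

if __name__ == "__main__":
    asyncio.run(main())

The headset client connects to this endpoint, sends the recognized voice command as a JSON message, and instantiates one object per returned point at its X, Y, Z position.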

We chose Python for the backend because it is widely adopted for data analysis with machine learning algorithms. For the frontend, we use C# in Unity because Unity's built-in 3D coordinate system lets us display the data visualization in a more immersive manner.

Conclusion

We have designed a real-time data visualization system that enables users to interact with an AI-based interface on augmented/virtual reality platforms. We hope this system can be adopted in research studies that involve complex data visualization.

Future Work

The current version of the approach uses research papers as its data source. We intend to add basic optical character recognition (OCR) features that use a head-mounted camera with computer vision algorithms to recognize text in the real world and retrieve data from it.
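As a rough sketch of what that planned OCR step might look like, assuming pytesseract and OpenCV on frames captured from the head-mounted camera (this is not yet part of the system, and the camera index is a placeholder):

# Hypothetical OCR step for the planned feature: grab a frame from the
# head-mounted camera, recognize text with Tesseract, and reuse the
# recognized words as a search query.

import cv2
import pytesseract

def recognize_text(frame_bgr):
    """Return the text Tesseract finds in a single camera frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return pytesseract.image_to_string(gray).strip()

if __name__ == "__main__":
    capture = cv2.VideoCapture(0)   # stand-in for the head-mounted camera
    ok, frame = capture.read()
    capture.release()
    if ok:
        query = recognize_text(frame)
        print("Recognized text, reusable as a search query:", query)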

References

Le, Q. V., & Mikolov, T. (2014). Distributed representations of sentences and documents. In ICML (Vol. 14, pp. 1188-1196).
Tipping, M. E., & Bishop, C. M. (1999). Mixtures of probabilistic principal component analyzers. Neural Computation, 11(2), 443-482.
Hoffman, M., Bach, F. R., & Blei, D. M. (2010). Online learning for latent Dirichlet allocation. In Advances in Neural Information Processing Systems (pp. 856-864).